maximal lottery
Adaptive Preference Aggregation
AI alignment, the challenge of ensuring AI systems act in accordance with human values, has emerged as a critical problem in the development of systems such as foundation models and recommender systems. Still, the current dominant approach, reinforcement learning with human feedback (RLHF) faces known theoretical limitations in aggregating diverse human preferences. Social choice theory provides a framework to aggregate preferences, but was not developed for the multidimensional applications typical of AI. Leveraging insights from a recently published urn process, this work introduces a preference aggregation strategy that adapts to the user's context and that inherits the good properties of the maximal lottery, a Condorcet-consistent solution concept.
Jackpot! Alignment as a Maximal Lottery
Maura-Rivero, Roberto-Rafael, Lanctot, Marc, Visin, Francesco, Larson, Kate
Reinforcement Learning from Human Feedback (RLHF), the standard for aligning Large Language Models (LLMs) with human values, is known to fail to satisfy properties that are intuitively desirable, such as respecting the preferences of the majority \cite{ge2024axioms}. To overcome these issues, we propose the use of a probabilistic Social Choice rule called \emph{maximal lotteries} as a replacement for RLHF. We show that a family of alignment techniques, namely Nash Learning from Human Feedback (NLHF) \cite{munos2023nash} and variants, approximate maximal lottery outcomes and thus inherit its beneficial properties. We confirm experimentally that our proposed methodology handles situations that arise when working with preferences more robustly than standard RLHF, including supporting the preferences of the majority, providing principled ways of handling non-transitivities in the preference data, and robustness to irrelevant alternatives. This results in systems that better incorporate human values and respect human intentions.
- North America > United States > New York (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Government > Voting & Elections (0.47)
- Leisure & Entertainment > Games (0.46)
Evaluating Agents using Social Choice Theory
Lanctot, Marc, Larson, Kate, Bachrach, Yoram, Marris, Luke, Li, Zun, Bhoopchand, Avishkar, Anthony, Thomas, Tanner, Brian, Koop, Anna
We argue that many general evaluation problems can be viewed through the lens of voting theory. Each task is interpreted as a separate voter, which requires only ordinal rankings or pairwise comparisons of agents to produce an overall evaluation. By viewing the aggregator as a social welfare function, we are able to leverage centuries of research in social choice theory to derive principled evaluation frameworks with axiomatic foundations. These evaluations are interpretable and flexible, while avoiding many of the problems currently facing cross-task evaluation. We apply this Voting-as-Evaluation (VasE) framework across multiple settings, including reinforcement learning, large language models, and humans. In practice, we observe that VasE can be more robust than popular evaluation frameworks (Elo and Nash averaging), discovers properties in the evaluation data not evident from scores alone, and can predict outcomes better than Elo in a complex seven-player game. We identify one particular approach, maximal lotteries, that satisfies important consistency properties relevant to evaluation, is computationally efficient (polynomial in the size of the evaluation data), and identifies game-theoretic cycles.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Michigan (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (4 more...)
- Leisure & Entertainment > Sports (1.00)
- Leisure & Entertainment > Games > Chess (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Breaking the Metric Voting Distortion Barrier
Charikar, Moses, Ramakrishnan, Prasanna, Wang, Kangning, Wu, Hongxun
We consider the following well studied problem of metric distortion in social choice. Suppose we have an election with $n$ voters and $m$ candidates who lie in a shared metric space. We would like to design a voting rule that chooses a candidate whose average distance to the voters is small. However, instead of having direct access to the distances in the metric space, each voter gives us a ranked list of the candidates in order of distance. Can we design a rule that regardless of the election instance and underlying metric space, chooses a candidate whose cost differs from the true optimum by only a small factor (known as the distortion)? A long line of work culminated in finding deterministic voting rules with metric distortion $3$, which is the best possible for deterministic rules and many other classes of voting rules. However, without any restrictions, there is still a significant gap in our understanding: Even though the best lower bound is substantially lower at $2.112$, the best upper bound is still $3$, which is attained even by simple rules such as Random Dictatorship. Finding a rule that guarantees distortion $3 - \varepsilon$ for some constant $\varepsilon $ has been a major challenge in computational social choice. In this work, we give a rule that guarantees distortion less than $2.753$. To do so we study a handful of voting rules that are new to the problem. One is Maximal Lotteries, a rule based on the Nash equilibrium of a natural zero-sum game which dates back to the 60's. The others are novel rules that can be thought of as hybrids of Random Dictatorship and the Copeland rule. Though none of these rules can beat distortion $3$ alone, a careful randomization between Maximal Lotteries and any of the novel rules can.
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > France (0.04)
- Europe > Austria (0.04)
Query Complexity of Tournament Solutions
Dey, Palash (Indian Institute of Science, Bangalore)
A directed graph where there is exactly one edge between every pair of vertices is called a tournament. Finding the “best” set of vertices of a tournament is a well studied problem in social choice theory. A tournament solution takes a tournament as input and outputs a subset of vertices of the input tournament. However, in many applications, for example, choosing the best set of drugs from a given set of drugs, the edges of the tournament are given only implicitly and knowing the orientation of an edge is costly. In such scenarios, we would like to know the best set of vertices (according to some tournament solution) by “querying” as few edges as possible. We, in this paper, precisely study this problem for commonly used tournament solutions: given an oracle access to the edges of a tournament T , find f(T) by querying as few edges as possible, for a tournament solution f. We first show that the set of Condorcet non-losers in a tournament can be found by querying 2n−⌊log n⌋−2 edges only and this is tight in the sense that every algorithm for finding the set of Condorcet non-losers needs to query at least 2n−⌊log n⌋−2 edges in the worst case, where n is the number of vertices in the input tournament. We then move on to study other popular tournament solutions and show that any algorithm for finding the Copeland set, the Slater set, the Markov set, the bipartisan set, the uncovered set, the Banks set, and the top cycle must query Ω(n 2 ) edges in the worst case. On the positive side, we are able to circumvent our strong query complexity lower bound results by proving that, if the size of the top cycle of the input tournament is at most k, then we can find all the tournament solutions mentioned above by querying O(nk + n log n / log(1− 1 / k ) ) edges only.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)